Tutorial
This tutorial shows the functionalities of Extremes.jl. They are illustrated by reproducing some of the results shown by Coles (2001) in An Introduction to Statistical Modeling of Extreme Values.
Before executing this tutorial, make sure to have installed the following packages:
- Extremes.jl (of course)
- DataFrames.jl (for using the DataFrame type)
- Distributions.jl (for using probability distribution objects)
- Gadfly.jl (for plotting)
and import them using the following command:
julia> using Extremes, Dates, DataFrames, Distributions, GadflyModel for stationary threshold exceedances
The stationary ThresholdExceedance model is illustrated using the daily rainfall accumulations at a location in south-west England from 1914 to 1962. This dataset was studied by Coles (2001) in Chapter 4.
Load the data
Loading the daily rainfall at a location in South-England:
data = load("rain")
first(data,5)| Date | Rainfall | |
|---|---|---|
| Date… | Float64 | |
| 1 | 1914-01-01 | 0.0 |
| 2 | 1914-01-02 | 2.3 |
| 3 | 1914-01-03 | 1.3 |
| 4 | 1914-01-04 | 6.9 |
| 5 | 1914-01-05 | 4.6 |
Plotting the data using the Gadfly package:
set_default_plot_size(14cm ,8cm)
plot(data, x=:Date, y=:Rainfall, Geom.point, Theme(discrete_highlight_color=c->nothing))
Threshold selection
TODO
GPD parameters estimation
Let's first identify the threshold exceedances:
threshold = 30.0
df = filter(row -> row.Rainfall > threshold, data)
first(df, 5)| Date | Rainfall | |
|---|---|---|
| Date… | Float64 | |
| 1 | 1914-02-07 | 31.8 |
| 2 | 1914-03-08 | 32.5 |
| 3 | 1914-12-17 | 31.8 |
| 4 | 1914-12-30 | 44.5 |
| 5 | 1915-02-13 | 30.5 |
Get the exceedances above the threshold:
df[!,:Rainfall] = df[!,:Rainfall] .- threshold
rename!(df, :Rainfall => :Exceedance)
first(df, 5)| Date | Exceedance | |
|---|---|---|
| Date… | Float64 | |
| 1 | 1914-02-07 | 1.8 |
| 2 | 1914-03-08 | 2.5 |
| 3 | 1914-12-17 | 1.8 |
| 4 | 1914-12-30 | 14.5 |
| 5 | 1915-02-13 | 0.5 |
GP parameters estimation with probability weighted moments
The GP parameter estimation with probability weighted moments is performed as follows:
julia> fm = gpfitpwm(df, :Exceedance)
pwmEVA
model :
ThresholdExceedance
data : Array{Float64,1}[152]
logscale : ϕ ~ 1
shape : ξ ~ 1
θ̂ : [1.9877399514951732, 0.19651587232938317]The approximate covariance matrix of the parameter estimates can be obtained with the function parametervar:
julia> parametervar(fm)
2×2 Array{Float64,2}:
0.0157177 -0.00656502
-0.00656502 0.00683069GP parameters estimation with maximum likelihood
The GP parameter estimation with maximum likelihood is performed as follows:
julia> fm = gpfit(df, :Exceedance)
MaximumLikelihoodEVA
model :
ThresholdExceedance
data : Array{Float64,1}[152]
logscale : ϕ ~ 1
shape : ξ ~ 1
θ̂ : [2.006896498380506, 0.1844926991237574]The approximate covariance matrix of the parameter estimates can be obtained with the function parametervar:
julia> parametervar(fm)
2×2 Array{Float64,2}:
0.0165972 -0.00880429
-0.00880429 0.0102416GP parameters estimation with the Bayesian method
The GP parameter estimation with the Bayesian method is performed as follows:
julia> fm = gpfitbayes(df, :Exceedance)
Progress: 0%| | ETA: 0:18:49[K
Progress: 22%|█████████▎ | ETA: 0:00:02[K
Progress: 37%|███████████████ | ETA: 0:00:01[K
Progress: 40%|████████████████▍ | ETA: 0:00:01[K
Progress: 51%|████████████████████▉ | ETA: 0:00:01[K
Progress: 64%|██████████████████████████▍ | ETA: 0:00:01[K
Progress: 75%|██████████████████████████████▉ | ETA: 0:00:00[K
Progress: 89%|████████████████████████████████████▍ | ETA: 0:00:00[K
Progress: 100%|█████████████████████████████████████████| Time: 0:00:01[K
BayesianEVA
model :
ThresholdExceedance
data : Array{Float64,1}[152]
logscale : ϕ ~ 1
shape : ξ ~ 1
sim :
Mamba.Chains
Iterations : 2001:5000
Thinning interval : 1
Chains : 1
Samples per chain : 3000
Value : Array{Float64,3}[3000,2,1]Currently, only the improper uniform prior is implemented, i.e. \[ f_{(ϕ,ξ)}(ϕ,ξ) ∝ 1. \] It yields to a proper posterior as long as the sample size is larger than 2 (Northrop and Attalides, 2016).
Currently, the No-U-Turn Sampler extension (Hoffman and Gelman, 2014) to Hamiltonian Monte Carlo (Neel, 2011, Chapter 5) is implemented for simulating an autocorrelated sample from the posterior distribution.
The approximate covariance matrix of the parameter estimates can be obtained with the function parametervar:
julia> parametervar(fm)
2×2 Array{Float64,2}:
0.0161428 -0.00873375
-0.00873375 0.0106877Return level estimation
With the ThresholdExceedance structure, the returnlevel function requires several arguments to calculate the T-year return level:
- the threshold value;
- the number of total observation (below and above the threshold);
- the number of observations per year;
- the return period T;
- the confidence level for computing the confidence interval.
The function uses the Peaks-Over-Threshold model definition (Coles, 2001, Chapter 4) for computing the T-year return level.
For the rainfall example, the 100-year return level can be estimated as follows:
fm = gpfit(df, :Exceedance)
r = returnlevel(fm, threshold, size(data,1), 365, 100, .95)ReturnLevel
fittedmodel :
MaximumLikelihoodEVA
model :
ThresholdExceedance
data : Array{Float64,1}[152]
logscale : ϕ ~ 1
shape : ξ ~ 1
θ̂ : [2.006896498380506, 0.1844926991237574]
returnperiod : 100
value : Array{Float64,1}[1]
cint : Array{Array{Float64,1},1}[1]
where the value can be accessed with
julia> r.value
1-element Array{Float64,1}:
106.32558691303024and where the corresponding confidence interval can be accessed with
julia> r.cint
1-element Array{Array{Float64,1},1}:
[65.48163774428642, 147.16953608177405]In this example of a stationary model, the function returns a unit dimension vector for the return level and a vector containing only one vector for the confidence interval. The reason is that the function always returns the same type in the stationary and non-stationary case. The function is therefore type-stable allowing better performance of code execution.
To get the scalar return level in the stationary case, the following command can be used:
julia> r.value[]
106.32558691303024To get the scalar confidence interval in the stationary case, the following command can be used:
julia> r.cint[]
2-element Array{Float64,1}:
65.48163774428642
147.16953608177405Probability weighted moments estimation
Probability weighted moments estimation of the GEV parameters can also be performed by using the gevfitpwm function. All the methods also apply to the pwmEVA object.
julia> fm = gpfitpwm(df, :Exceedance)
pwmEVA
model :
ThresholdExceedance
data : Array{Float64,1}[152]
logscale : ϕ ~ 1
shape : ξ ~ 1
θ̂ : [1.9877399514951732, 0.19651587232938317]Bayesian estimation
Bayesian estimation of the GEV parameters can also be performed by using the gevfitbayes function. All the methods also apply to the `BayesianEVA object.
julia> fm = gpfitbayes(df, :Exceedance)
Progress: 12%|█████▏ | ETA: 0:00:01[K
Progress: 22%|████████▉ | ETA: 0:00:01[K
Progress: 36%|██████████████▋ | ETA: 0:00:01[K
Progress: 49%|████████████████████▏ | ETA: 0:00:00[K
Progress: 59%|████████████████████████▏ | ETA: 0:00:00[K
Progress: 73%|█████████████████████████████▉ | ETA: 0:00:00[K
Progress: 85%|██████████████████████████████████▉ | ETA: 0:00:00[K
Progress: 96%|███████████████████████████████████████▍ | ETA: 0:00:00[K
Progress: 100%|█████████████████████████████████████████| Time: 0:00:00[K
BayesianEVA
model :
ThresholdExceedance
data : Array{Float64,1}[152]
logscale : ϕ ~ 1
shape : ξ ~ 1
sim :
Mamba.Chains
Iterations : 2001:5000
Thinning interval : 1
Chains : 1
Samples per chain : 3000
Value : Array{Float64,3}[3000,2,1]Model for dependent data
The stationary ThresholdExceedance model is illustrated using the daily rainfall accumulations at a location in south-west England from 1914 to 1962. This dataset was studied by Coles (2001) in Chapter 4.
Load the data
Loading the daily rainfall at a location in South-England:
data = load("wooster")
first(data,5)| Date | Temperature | |
|---|---|---|
| Date… | Int64 | |
| 1 | 1983-01-01 | 23 |
| 2 | 1983-01-02 | 29 |
| 3 | 1983-01-03 | 19 |
| 4 | 1983-01-04 | 14 |
| 5 | 1983-01-05 | 27 |
Plotting the data using the Gadfly package:
plot(data, x=:Date, y=:Temperature, Geom.point, Theme(discrete_highlight_color=c->nothing))
df = copy(data)
df[!,:Temperature] = -data[:,:Temperature]
filter!(row -> month(row.Date) ∈ (1,2,11,12), df)
plot(df, x=:Date, y=:Temperature, Geom.point)
Declustering the threshold exceedances
threshold = -10
cluster = getcluster(df[:,:Temperature], -10, runlength=4)julia> typeof(cluster)
Array{Cluster,1}GPD parameters estimation
Let's first identify the threshold exceedances:
threshold = 30.0
df = filter(row -> row.Rainfall > threshold, data)
first(df, 5)| Date | Rainfall | |
|---|---|---|
| Date… | Float64 | |
| 1 | 1914-02-07 | 31.8 |
| 2 | 1914-03-08 | 32.5 |
| 3 | 1914-12-17 | 31.8 |
| 4 | 1914-12-30 | 44.5 |
| 5 | 1915-02-13 | 30.5 |
Get the exceedances above the threshold:
df[!,:Rainfall] = df[!,:Rainfall] .- threshold
rename!(df, :Rainfall => :Exceedance)
first(df, 5)| Date | Exceedance | |
|---|---|---|
| Date… | Float64 | |
| 1 | 1914-02-07 | 1.8 |
| 2 | 1914-03-08 | 2.5 |
| 3 | 1914-12-17 | 1.8 |
| 4 | 1914-12-30 | 14.5 |
| 5 | 1915-02-13 | 0.5 |
Generalized Pareto parameter estimation by maximum likelihood:
julia> fm = gpfit(df, :Exceedance)
MaximumLikelihoodEVA
model :
ThresholdExceedance
data : Array{Float64,1}[152]
logscale : ϕ ~ 1
shape : ξ ~ 1
θ̂ : [2.006896498380506, 0.1844926991237574]The function returns the estimates of the log-scale parameter $\phi = \log \sigma$.
Return level estimation
With the ThresholdExceedance structure, the returnlevel function requires several arguments to calculate the T-year return level:
- the threshold value;
- the number of total observation (below and above the threshold);
- the number of observations per year;
- the return period T;
- the confidence level for computing the confidence interval.
The function uses the Peaks-Over-Threshold model definition (Coles, 2001, Chapter 4) for computing the T-year return level.
For the rainfall example, the 100-year return level can be estimated as follows:
julia> r = returnlevel(fm, threshold, size(data,1), 365, 100, .95)
ReturnLevel
fittedmodel :
MaximumLikelihoodEVA
model :
ThresholdExceedance
data : Array{Float64,1}[152]
logscale : ϕ ~ 1
shape : ξ ~ 1
θ̂ : [2.006896498380506, 0.1844926991237574]
returnperiod : 100
value : Array{Float64,1}[1]
cint : Array{Array{Float64,1},1}[1]where the value can be accessed with
julia> r.value
1-element Array{Float64,1}:
106.32558691303024and where the corresponding confidence interval can be accessed with
julia> r.cint
1-element Array{Array{Float64,1},1}:
[65.48163774428642, 147.16953608177405]In this example of a stationary model, the function returns a unit dimension vector for the return level and a vector containing only one vector for the confidence interval. The reason is that the function always returns the same type in the stationary and non-stationary case. The function is therefore type-stable allowing better performance of code execution.
To get the scalar return level in the stationary case, the following command can be used:
julia> r.value[]
106.32558691303024To get the scalar confidence interval in the stationary case, the following command can be used:
julia> r.cint[]
2-element Array{Float64,1}:
65.48163774428642
147.16953608177405Probability weighted moments estimation
Probability weighted moments estimation of the GEV parameters can also be performed by using the gevfitpwm function. All the methods also apply to the pwmEVA object.
julia> fm = gpfitpwm(df, :Exceedance)
pwmEVA
model :
ThresholdExceedance
data : Array{Float64,1}[152]
logscale : ϕ ~ 1
shape : ξ ~ 1
θ̂ : [1.9877399514951732, 0.19651587232938317]Bayesian estimation
Bayesian estimation of the GEV parameters can also be performed by using the gevfitbayes function. All the methods also apply to the `BayesianEVA object.
julia> fm = gpfitbayes(df, :Exceedance)
Progress: 12%|████▉ | ETA: 0:00:01[K
Progress: 22%|█████████▎ | ETA: 0:00:01[K
Progress: 36%|██████████████▊ | ETA: 0:00:01[K
Progress: 49%|████████████████████ | ETA: 0:00:00[K
Progress: 62%|█████████████████████████▌ | ETA: 0:00:00[K
Progress: 76%|███████████████████████████████ | ETA: 0:00:00[K
Progress: 86%|███████████████████████████████████▍ | ETA: 0:00:00[K
Progress: 99%|████████████████████████████████████████▊| ETA: 0:00:00[K
Progress: 100%|█████████████████████████████████████████| Time: 0:00:00[K
BayesianEVA
model :
ThresholdExceedance
data : Array{Float64,1}[152]
logscale : ϕ ~ 1
shape : ξ ~ 1
sim :
Mamba.Chains
Iterations : 2001:5000
Thinning interval : 1
Chains : 1
Samples per chain : 3000
Value : Array{Float64,3}[3000,2,1]Model for non-stationary data
Coles(2001, Chapter 6)